Applied Metaphors: Learning TRIZ, Complexity, Data/Stats/ML using Metaphors


    🕔 Time Series

    CandleStick Graphs
    Heatmap Graphs (over time)
    Line Graphs
    Time Series
    Author

    Arvind V

    Published

    December 15, 2022

    Modified

    July 17, 2025

    Abstract
    Events, Trends, Seasons, and Changes over Time

    Setting up R Packages

    library(tidyverse)
    library(mosaic)
    library(ggformula) # Our Formula based graphing package
    
    # Wrangling
    library(lubridate) # Deal with dates. Part of the tidyverse anyway!
    
    library(fpp3) # Rob Hyndman's textbook package. Loads all the core time series packages, see messages
    
    # Plots
    library(timetk) # Visualizing, Wrangling and Modelling Time Series by Matt Dancho
    library(tsbox) # Plotting and Time Series File Transformations
    # library(TSstudio) # Plots, Decomposition, and Modelling with Time Series.
    # Seems hard to get to work in Quarto, so it is called as TSstudio::ts_plot() below
    
    # Modelling
    library(sweep) # New (07/2023) package to bring broom-like features to time series models
    
    # devtools::install_github("FinYang/tsdl")
    library(tsdl) # Time Series Data Library from Rob Hyndman

    Tip: mosaic and ggformula command template

    Note the standard method for all commands from the mosaic and ggformula packages: goal( y ~ x | z, data = _____)

    With ggformula, one can create any graph/chart using: gf_***(y ~ x | z, data = _____)

    In practice, we often use: dataframe %>% gf_***(y ~ x | z) which has cool benefits such as “autocompletion” of variable names, as we shall see. The “***” indicates what kind of graph you desire: histogram, bar, scatter, density; the “___” is the name of your dataset that you want to plot with.

    Tip: ggplot command template

    The ggplot2 template is used to identify the dataframe, identify the x and y axis, and define visualized layers:

    ggplot(data = ---, mapping = aes(x = ---, y = ---)) + geom_----()

    Note: --- is meant to imply text you supply, e.g. function names, data frame names, variable names.

    It is helpful to see the argument mapping, above. In practice, rather than typing the formal arguments, code is typically shortened to this:

    dataframe %>% ggplot(aes(xvar, yvar)) + geom_----()
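
    As a minimal sketch of the two templates above, the same scatter plot can be drawn both ways. The built-in mtcars dataset and the variables mpg, wt and cyl are assumptions chosen purely for illustration; they are not part of this module's data.

```r
library(ggplot2)
library(ggformula)
library(dplyr) # for the %>% pipe

# ggformula template: dataframe %>% gf_***(y ~ x | z)
# mpg vs wt, one facet per number of cylinders
p1 <- mtcars %>%
  gf_point(mpg ~ wt | cyl)

# ggplot template: dataframe %>% ggplot(aes(xvar, yvar)) + geom_***()
# Faceting is added as a separate layer
p2 <- mtcars %>%
  ggplot(aes(wt, mpg)) +
  geom_point() +
  facet_wrap(vars(cyl))

p1
p2
```

    Note how the formula puts y on the left of the tilde, whereas aes() takes x first.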

    Introduction

    Any metric that is measured over regular time intervals forms a time series. Analysis of Time Series is commercially important because of industrial need and relevance, especially with respect to Forecasting (weather data, sports scores, population growth figures, stock prices, demand, sales, supply, ...). For example, the graph below shows the temperatures over time in two US cities:

    What can we do with Time Series? As with other datasets, we have to begin by answering fundamental questions, such as:

    1. What are the types of time series?
    2. How do we visualize time series?
    3. How might we summarize time series to get aggregate numbers, say by week, month, quarter or year?
    4. How do we decompose the time series into level, trend, and seasonal components?
    5. How might we make a model of the underlying process that creates these time series?
    6. How do we make useful forecasts with the data we have?

    We will first look at the multiple data formats for time series in R. Alongside, we will look at the R packages that work with these formats and create graphs and measures from those objects. Then we examine data wrangling of time series, using packages that offer a dplyr-like ability to group and summarize time series by the time variable. Finally, we will look at obtaining the components of the time series and try our hand at modelling and forecasting.

    Time Series Formats, Conversion, and Plotting

    There are multiple formats for time series data. The ones that we are likely to encounter most are:

    • The ts format: We may simply have a single series of measurements that are made over time, stored as a numerical vector. The stats::ts() function will convert a numeric vector into an R time series ts object, which is the most basic time series object in R. The base-R ts object is used by established packages such as forecast and is also supported by newer packages such as tsbox.

    • The tibble format: the simplest and most familiar data format is of course the standard tibble/data frame, with or without an explicit time column/variable to indicate that the other variables vary with time. The standard tibble object is used by many packages, e.g. timetk & modeltime.

    • The modern tsibble format: this is a newer format for time series analysis. The special tsibble object (“time series tibble”) is used by fable, feasts and others from the tidyverts set of packages.

    There are many other time-oriented data formats too, probably too many, such as tibbletime and TimeSeries objects. For now the best way to deal with these, should you encounter them, is to convert them (using tsbox) to a tibble or a tsibble and work with these.
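
    A minimal sketch of such conversions, assuming the tsbox and tsibble packages are installed. It uses the built-in AirPassengers series and the tsbox converters ts_tbl(), ts_tsibble() and ts_ts(); the object names are illustrative.

```r
library(tsbox)

# Base-R ts object -> tibble with `time` and `value` columns
air_tbl <- ts_tbl(AirPassengers)
head(air_tbl)

# Base-R ts object -> tsibble (needs the tsibble package installed)
air_tsbl <- ts_tsibble(AirPassengers)
air_tsbl

# And back again: tibble -> ts
air_ts <- ts_ts(air_tbl)
frequency(air_ts) # monthly data
```

    tsbox names its converters after the target class, so the same pattern applies to the other formats it supports.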

    [Figure: Standards]

    To start, we will work with simple ts data, and then with data in tibble format that we can plot as is. We will then do more after converting that data to tsibble format, and finish with a third example that is a tsibble dataset from the ground up.

    Base-R ts format data

    There are a few datasets in base R that are in ts format already.

    AirPassengers
         Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    1949 112 118 132 129 121 135 148 148 136 119 104 118
    1950 115 126 141 135 125 149 170 170 158 133 114 140
    1951 145 150 178 163 172 178 199 199 184 162 146 166
    1952 171 180 193 181 183 218 230 242 209 191 172 194
    1953 196 196 236 235 229 243 264 272 237 211 180 201
    1954 204 188 235 227 234 264 302 293 259 229 203 229
    1955 242 233 267 269 270 315 364 347 312 274 237 278
    1956 284 277 317 313 318 374 413 405 355 306 271 306
    1957 315 301 356 348 355 422 465 467 404 347 305 336
    1958 340 318 362 348 363 435 491 505 404 359 310 337
    1959 360 342 406 396 420 472 548 559 463 407 362 405
    1960 417 391 419 461 472 535 622 606 508 461 390 432
    str(AirPassengers)
     Time-Series [1:144] from 1949 to 1961: 112 118 132 129 121 135 148 148 136 119 ...

    This can be easily plotted using base R and other more recent packages:

    # Base R
    plot(AirPassengers)
    # tsbox static plot
    tsbox::ts_plot(AirPassengers, ylab = "Passengers")

    One can see that there is an upward trend and also seasonal variations that also increase over time. This is an example of a multiplicative time series, which we will discuss later.
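
    As a quick preview of what multiplicative means here, base R's stats::decompose() can split the series into trend, seasonal and random parts. This is only a sketch using classical decomposition, which may not be the method used when the topic is taken up in detail.

```r
# Classical decomposition from base R (stats package)
air_mult <- decompose(AirPassengers, type = "multiplicative")

# Seasonal factors, one per month; values above 1 mark months above trend
round(air_mult$figure, 2)

# Observed series, trend, seasonal and random components in one plot
plot(air_mult)

# Taking logs turns a multiplicative structure into an additive one,
# so the seasonal swings look roughly constant in size
plot(log(AirPassengers), ylab = "log(Passengers)")
```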

    Let us take data that is “time oriented” but not in ts format. We use the command ts() to convert a numeric vector to ts format. The syntax of ts() is:

    Syntax: objectName <- ts(data, start, end, frequency), where,

    • data : represents the data vector
    • start : represents the first observation in time series
    • end : represents the last observation in time series
    • frequency : represents the number of observations per unit of time. For example, with a year as the unit: 1 = annual, 4 = quarterly, 12 = monthly. A frequency of 7 suits daily observations with a week as the unit.
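
    A minimal sketch of these arguments in action, using made-up quarterly figures (the sales vector is purely illustrative):

```r
# Eight made-up quarterly sales figures (illustrative data only)
sales <- c(12, 15, 21, 14, 13, 17, 24, 16)

# Quarterly series starting in Q1 2020: start = c(year, period), frequency = 4
sales_ts <- ts(sales, start = c(2020, 1), frequency = 4)
sales_ts

start(sales_ts)     # first observation: year 2020, quarter 1
end(sales_ts)       # last observation: year 2021, quarter 4
frequency(sales_ts) # 4 observations per year

# window() extracts a sub-period using the same (year, period) notation
sales_2021 <- window(sales_ts, start = c(2021, 1))
sales_2021
```

    Here end is left out, so ts() works it out from start, frequency and the length of the data.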

    We will pick simple numerical data that is not yet a time series: the weight measurements in the built-in ChickWeight data frame:

    ChickWeight %>% head()
    # Filter for Chick #1 and for Diet #1
    ChickWeight_ts <- ChickWeight %>%
      filter(Chick == 1, Diet == 1) %>%
      select(weight, Time)
    
    ChickWeight_ts <- stats::ts(ChickWeight_ts$weight, frequency = 2)
    str(ChickWeight_ts)
     Time-Series [1:12] from 1 to 6.5: 42 51 59 64 76 93 106 125 149 171 ...

    Now we can plot this in many ways:

    plot(ChickWeight_ts) # Using base-R
    # ts_boxable(ChickWeight_ts)
    # Using tsbox
    tsbox::ts_plot(ChickWeight_ts,
      ylab = "Weight of Chick #1"
    )

    # Using TSstudio
    TSstudio::ts_plot(ChickWeight_ts,
      Xtitle = "Time",
      Ytitle = "Weight of Chick #1"
    )

    We see that the weight of a young chick specimen increases over time.

    tibble data

    A ts object holds one series, or several numeric series that share a single time index. If we want multiple time series distinguished by, say, Qualitative variables, we need other data formats. Using the familiar tibble structure opens up new possibilities.

    • We can have multiple time series within a tibble (think of numerical time-series data like GDP, Population, Imports, Exports for multiple countries as with the gapminder data we saw earlier).
    • It also allows for data processing with dplyr such as filtering and summarizing.

    gapminder data

    Let us read in and inspect the US births data from 2000 to 2014. Download this data by clicking on the icon below, and saving the downloaded file in a sub-folder called data inside your project.

    Read this data in:

    births_2000_2014 <- read_csv("../data/US_births_2000-2014_SSA.csv")
    glimpse(births_2000_2014)
    Rows: 5,479
    Columns: 5
    $ year          <dbl> 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 20…
    $ month         <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
    $ date_of_month <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
    $ day_of_week   <dbl> 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3,…
    $ births        <dbl> 9083, 8006, 11363, 13032, 12558, 12466, 12516, 8934, 794…
    inspect(births_2000_2014)
    
    quantitative variables:  
               name   class  min   Q1 median    Q3   max         mean          sd
    1          year numeric 2000 2003   2007  2011  2014  2006.999270    4.321085
    2         month numeric    1    4      7    10    12     6.522723    3.449075
    3 date_of_month numeric    1    8     16    23    31    15.730243    8.801151
    4   day_of_week numeric    1    2      4     6     7     3.999817    2.000502
    5        births numeric 5728 8740  12343 13082 16081 11350.068261 2325.821049
         n missing
    1 5479       0
    2 5479       0
    3 5479       0
    4 5479       0
    5 5479       0
    births_2000_2014

    This is just a tibble containing a single data variable births that varies over time. All other variables, although depicting time, are numerical columns. There are no Qualitative variables (yet!).

    Plotting tibble time series

    • Using ggformula
    • Using tsbox and TSstudio
    • Using ggplot

    We will now plot this using ggformula. Using the separate year, month, date_of_month and day_of_week columns, we can plot births over time, colouring by day_of_week, for example:

    # grouping by day_of_week
    births_2000_2014 %>%
      gf_line(births ~ year,
        group = ~day_of_week,
        color = ~day_of_week
      ) %>%
      gf_point(title = "By Day of Week") %>%
      gf_theme(scale_colour_distiller(palette = "Paired"))
    # Grouping by date_of_month
    births_2000_2014 %>%
      gf_line(births ~ year,
        group = ~date_of_month,
        color = ~date_of_month
      ) %>%
      gf_point(title = "By Date of Month") %>%
      gf_theme(scale_colour_distiller(palette = "Paired"))

    Not particularly illuminating. This is because the data is daily, with considerable variation over time, and we have too much data to visualize at once. Summaries will help, so we could calculate the mean births on a monthly basis in each year and plot that:

    births_2000_2014_monthly <- births_2000_2014 %>%
      # Convert month to factor/Qual variable!
      # So that we can have discrete colours for each month
      # Using base::factor()
      # Could use forcats::as_factor() also
      mutate(month = base::factor(month, labels = month.abb)) %>%
      # `month.abb` is a built-in constant vector containing abbreviated month names.
      group_by(year, month) %>%
      summarise(mean_monthly_births = mean(births, na.rm = TRUE))
    births_2000_2014_monthly
    births_2000_2014_monthly %>%
      gf_line(mean_monthly_births ~ year,
        group = ~month,
        colour = ~month, linewidth = 1
      ) %>%
      gf_point(size = 1.5, title = "Summaries of Monthly Births over the years") %>%
      # palette for 12 colours
      gf_theme(scale_colour_brewer(palette = "Paired"))

    Note

    These are graphs for the same month each year: we have a January graph and a February graph and so on. So average births per month were higher in all months during 2005 to 2007 and have dropped since.

    We can do similar graphs using day_of_week as our basis for grouping, instead of month:

    births_2000_2014_weekly <- births_2000_2014 %>%
      mutate(day_of_week = base::factor(day_of_week,
        levels = c(1, 2, 3, 4, 5, 6, 7),
        labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
      )) %>%
      group_by(year, day_of_week) %>%
      summarise(mean_daily_births = mean(births, na.rm = TRUE))
    births_2000_2014_weekly
    births_2000_2014_weekly %>%
      gf_line(mean_daily_births ~ year,
        group = ~day_of_week,
        colour = ~day_of_week,
        linewidth = 1,
        data = .
      ) %>%
      gf_point(size = 2) %>%
      # "Paired" palette has 12 colours; 7 are used here, one per day
      gf_theme(scale_colour_brewer(palette = "Paired"))

    Note: Why are fewer babies born on weekends?

    Looks like an interesting story here: there are significantly fewer births on average on Sat and Sun, over the years! Why? Should we watch Grey's Anatomy?

    Important

    Note that this is still using just tibble data, without converting it or using it as a time series. So far we are simply treating the year/month/day variables as simple variables and using dplyr to group and summarize. We have not created an explicit time or date variable.

    Let us create a time variable in our dataset now:

    • tsbox::ts_plot needs just the date and the births columns to plot with and not be confused by the other numerical columns, so let us create a single date column from these three, but retain them for now.
    • TSstudio::ts_plot also needs a date column.

    So there are several numerical variables for year, month, and day_of_month, day_of_week, and of course the births on a daily basis.

    We use the lubridate package from the tidyverse:

    births_timeseries <-
      births_2000_2014 %>%
      mutate(date = lubridate::make_date(
        year = year,
        month = month,
        day = date_of_month
      )) %>%
      select(date, births, year, month, date_of_month, day_of_week)
    
    births_timeseries

    Tip: Extract from help(tsbox)

    In data frames, i.e., in a data.frame, a data.table, or a tibble, tsbox stores one or multiple time series in the ‘long’ format. tsbox detects a value, a time column, and zero, one or several id columns. Column detection is done in the following order:

    • Starting on the right, the first numeric or integer column is used as value column.
    • Using the remaining columns and starting on the right again, the first Date, POSIXct, numeric or character column is used as time column. character strings are parsed by anytime::anytime(). The timestamp, time, indicates the beginning of a period.
    • All remaining columns are id columns. Each unique combination of id columns points to a (unique) time series.

    Alternatively, the time column and the value column can be explicitly named as time and value. If explicit names are used, the column order will be ignored. If columns are detected automatically, a message is returned.
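
    The detection rules above can be tried out on a tiny data frame. This is a sketch with made-up numbers; the toy, toy_named and related names are illustrative, and tsbox::ts_ts() is used only as a convenient way to trigger the detection.

```r
library(tsbox)

# Twelve months of made-up values (illustrative data only)
toy <- data.frame(
  date   = seq(as.Date("2000-01-01"), by = "month", length.out = 12),
  births = c(310, 295, 330, 321, 340, 335, 350, 348, 332, 325, 300, 315)
)

# Automatic detection: scanning from the right, `births` is the first
# numeric column (value); `date` is then picked as the time column.
# tsbox reports what it detected in a message.
toy_auto <- ts_ts(toy)

# Explicit naming: with columns called `time` and `value`,
# the column order is ignored
toy_named <- data.frame(value = toy$births, time = toy$date)
toy_explicit <- ts_ts(toy_named)

frequency(toy_explicit) # monthly data
```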

    Plotting this directly, after selecting the relevant variables, so that they will be auto-detected:

    births_timeseries %>%
      select(date, births) %>%
      tsbox::ts_plot()
    [time]: 'date' [value]: 'births' 

    births_timeseries %>%
      select(date, births) %>%
      TSstudio::ts_plot(
        Xtitle = "Year",
        Ytitle = "Births",
        title = "Births Time Series",
        Xgrid = TRUE, Ygrid = TRUE,
        slider = TRUE,
        width = 1
      ) # linewidth

    Quite messy, as before. We need to use the summarised data, as before. We will do this in the next section.

    We will now plot this using ggplot for completeness. Using the separate year, month, date_of_month and day_of_week columns, we can plot births over time, colouring by day_of_week, for example:

    # grouping by day_of_week
    births_2000_2014 %>%
      ggplot(aes(year, births,
        group = day_of_week,
        color = day_of_week
      )) +
      geom_line() +
      geom_point() +
      labs(title = "By Day of Week") +
      scale_colour_distiller(palette = "Paired")
    # Grouping by date_of_month
    births_2000_2014 %>% ggplot(aes(year, births,
      group = date_of_month,
      color = date_of_month
    )) +
      geom_line() +
      geom_point() +
      labs(title = "By Date of Month") +
      scale_colour_distiller(palette = "Paired")

    births_2000_2014_monthly <- births_2000_2014 %>%
      # Convert month to factor/Qual variable!
      # So that we can have discrete colours for each month
      # Using base::factor()
      # Could use forcats::as_factor() also
    
      mutate(month = base::factor(month, labels = month.abb)) %>%
      # `month.abb` is a built-in constant vector containing abbreviated month names.
    
      group_by(year, month) %>%
      summarise(mean_monthly_births = mean(births, na.rm = TRUE))
    births_2000_2014_monthly
    ###
    births_2000_2014_monthly %>%
      ggplot(aes(year, mean_monthly_births,
        group = month, colour = month
      )) +
      geom_line(linewidth = 1) +
      geom_point(size = 1.5) +
      labs(title = "Summaries of Monthly Births over the years") +
    
      # palette for 12 colours
      scale_colour_brewer(palette = "Paired")

    births_2000_2014_weekly <- births_2000_2014 %>%
      mutate(day_of_week = base::factor(day_of_week,
        levels = c(1, 2, 3, 4, 5, 6, 7),
        labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
      )) %>%
      group_by(year, day_of_week) %>%
      summarise(mean_daily_births = mean(births, na.rm = TRUE))
    births_2000_2014_weekly
    births_2000_2014_weekly %>%
      ggplot(aes(year, mean_daily_births,
        group = day_of_week,
        colour = day_of_week
      )) +
      geom_line() +
      geom_point() +
    
      # "Paired" palette has 12 colours; 7 are used here, one per day
      scale_colour_brewer(palette = "Paired")

    tsibble data

    Finally, we have tsibble (“time series tibble”) format data, which contains three main components:

    • an index variable that defines time;
    • a set of key variables, usually categorical, that define sets of observations, over time. This allows for each combination of the categorical variables to define a separate time series.
    • a set of quantitative variables, that represent the quantities that vary over time (i.e. over the index).
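
    To make these three components concrete, here is a minimal sketch that builds a small tsibble from scratch. The city names and temperatures are made up for illustration; only as_tsibble(), yearmonth(), index_var(), key_vars() and n_keys() come from the tsibble package.

```r
library(dplyr)
library(tsibble)

# Three months of made-up temperatures for two hypothetical cities
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 3)

toy_tsbl <- tibble::tibble(
  date = rep(dates, times = 2),
  city = rep(c("Northville", "Southport"), each = 3),
  temp = c(2.1, 3.4, 7.8, 14.2, 15.0, 17.3)
) %>%
  mutate(month = yearmonth(date)) %>% # index: one value per month
  select(-date) %>%
  as_tsibble(index = month, key = city) # key: one series per city

toy_tsbl
index_var(toy_tsbl) # "month"
key_vars(toy_tsbl)  # "city"
n_keys(toy_tsbl)    # 2 separate time series
```

    Here month is the index, city is the key, and temp is the quantitative variable that varies over time.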

    Here is Rob Hyndman's video introducing tsibbles:

    The package tsibbledata contains several ready-made datasets in tsibble format. Let us try PBS, which is a dataset containing monthly Medicare prescription data in Australia.

    Run data(package = "tsibbledata") in your Console to find out about these.
    data("PBS")
    # inspect(PBS) # does not work since mosaic cannot handle tsibbles
    PBS

    Data Description: This is a large-ish dataset:

    Run PBS in your console
    • 67K observations
    • 336 combinations of key variables (Concession, Type, ATC1, ATC2) which are categorical, as foreseen.
    • Data appears to be monthly, as indicated by the 1M.
    • the time index variable is called Month, formatted as yearmonth, a new type of variable introduced in the tsibble package

    Note that there are multiple Quantitative variables (Scripts, Cost), each sliced into 336 time series by the keys. This structure is not supported in the ts format, but is supported in a tsibble. The Qualitative Variables are described below.

    Type help("PBS") in your Console.

    The data is dis-aggregated/grouped using four keys:
    - Concession: Concessional scripts are given to pensioners, unemployed, dependents, and other card holders
    - Type: Co-payments are made until an individual's script expenditure hits a threshold ($290.00 for concession, $1141.80 otherwise). Safety net subsidies are provided to individuals exceeding this amount.
    - ATC1: Anatomical Therapeutic Chemical index (level 1). 15 types
    - ATC2: Anatomical Therapeutic Chemical index (level 2). 84 types, nested inside ATC1.

    Let us simply plot Cost over time:

    • Using ggformula
    • Using ggplot
    PBS %>%
      gf_point(Cost ~ Month, data = .) %>%
      gf_line(title = "PBS Costs vs time")

    PBS %>% ggplot(aes(Month, Cost)) +
      geom_point() +
      geom_line() +
      labs(title = "PBS Costs vs time")

    This basic plot is quite messy, and it is now time (pun intended!) for us to look at summaries of the data using dplyr-like verbs.

    Time-Series Wrangling

    We have now arrived at the need to filter, group, and summarize time-series data. We can do this in two ways, with two packages:

    Tip: tsibble has dplyr-like functions

    Using tsibble data, the tsibble package has specialized filter and group_by functions that work with the index (i.e. time) variable and the key variables, such as index_by() and group_by_key().

    Filtering based on Qual variables can be done with dplyr. We can use dplyr functions such as group_by(), mutate(), filter(), select() and summarise() to work with tsibble objects.

    Tip: timetk also has dplyr-like functions!

    Using tibbles, timetk provides functions such as summarize_by_time(), filter_by_time() and slidify() that are quite powerful. Again, as with tsibble, dplyr can always be used for other (i.e. non-time) variables.
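
    A small sketch of two of the timetk verbs named above, run on made-up daily data. The daily tibble and its n_events column are illustrative assumptions.

```r
library(dplyr)
library(timetk)

# 91 days of made-up daily counts, 1 January to 31 March 2020
daily <- tibble::tibble(
  date     = seq(as.Date("2020-01-01"), as.Date("2020-03-31"), by = "day"),
  n_events = seq_along(date)
)

# Monthly totals: one row per calendar month
monthly <- daily %>%
  summarise_by_time(
    .date_var = date,
    .by       = "month",
    total     = sum(n_events)
  )
monthly

# Keep only the rows that fall in February 2020
feb <- daily %>%
  filter_by_time(
    .date_var   = date,
    .start_date = "2020-02-01",
    .end_date   = "2020-02-29"
  )
nrow(feb)
```

    Both verbs take the date column explicitly through .date_var, which is why a plain tibble is enough.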

    Let us first see how many observations there are for each combo of keys:

    PBS %>%
      count()
    # Grouped Counts
    PBS %>%
      tsibble::group_by_key(ATC1, ATC2, Concession, Type) %>%
      dplyr::count()
    # dplyr grouping
    PBS %>%
      dplyr::group_by(ATC1, ATC2) %>%
      dplyr::count()

    We have 336 combinations of Qualitative variables, each combo containing 204 observations (except some! Take a look!): so let us filter for a few such combinations and plot:

    # Costs
    PBS %>%
      tsibble::group_by_key(ATC1, ATC2, Concession, Type) %>%
      gf_line(Cost ~ Month,
        colour = ~Type,
        data = .
      ) %>%
      gf_point(title = "Costs, per Month")
    # Scripts
    PBS %>%
      tsibble::group_by_key(ATC1, ATC2, Concession, Type) %>%
      gf_line(Scripts ~ Month,
        colour = ~Type,
        data = .
      ) %>%
      gf_point(title = "Scripts, per Month")
    # Costs variable for a specific combo of Qual variables(keys)
    PBS %>%
      dplyr::filter(
        Concession == "General",
        ATC1 == "A",
        ATC2 == "A10"
      ) %>%
      gf_line(Cost ~ Month,
        colour = ~Type,
        data = .
      ) %>%
      gf_point(title = "Costs, per Month for General/A/A10 category patients")
    # Scripts variable for a specific combo of Qual variables(keys)
    PBS %>%
      dplyr::filter(
        Concession == "General",
        ATC1 == "A",
        ATC2 == "A10"
      ) %>%
      gf_line(Scripts ~ Month,
        colour = ~Type,
        data = .
      ) %>%
      gf_point(title = "Scripts, per Month for General/A/A10 category patients")

    As can be seen, the time patterns differ markedly between the two Types of payment, and also between Cost and Scripts. Both are strongly seasonal, with seasonal variation increasing over the years, a clear sign of a multiplicative time series. There is a strong upward trend with both types of subsidies, Safety net and Co-payments, but these trends differ somewhat in magnitude for specific combinations of ATC1 and ATC2 categories.

    We can use tsibble's dplyr-like commands to develop summaries by year, quarter, and month (the original data). Look carefully at the new time variable created each time:

    # Original Data
    PBS
    # Cost Summary by Month, which is the original data
    # Only grouping happens here
    # New Variable Name to make grouping visible
    PBS %>%
      tsibble::group_by_key(ATC1, ATC2, Concession, Type) %>%
      tsibble::index_by(Month_Group = Month) %>%
      dplyr::summarise(across(
        .cols = c(Cost, Scripts),
        .fns = mean,
        .names = "mean_{.col}"
      ))
    # Cost Summary by Quarter
    PBS %>%
      tsibble::group_by_key(ATC1, ATC2, Concession, Type) %>%
      tsibble::index_by(Year_Quarter = yearquarter(Month)) %>% # And the change here!
      dplyr::summarise(across(
        .cols = c(Cost, Scripts),
        .fns = mean,
        .names = "mean_{.col}"
      ))
    # Cost Summary by Year
    PBS %>%
      tsibble::group_by_key(ATC1, ATC2, Concession, Type) %>%
      index_by(Year_Group = year(Month)) %>% # Note this change!!!
      dplyr::summarise(across(
        .cols = c(Cost, Scripts),
        .fns = mean,
        .names = "mean_{.col}"
      ))

    Finally, it may be a good idea to convert a tibble into a tsibble to leverage some of the functions that tsibble offers:

    births_tsibble <- births_2000_2014 %>%
      mutate(date = lubridate::make_date(
        year = year,
        month = month,
        day = date_of_month
      )) %>%
      # Convert to tsibble
      tsibble::as_tsibble(index = date) # Time Variable
    
    births_tsibble

    This is DAILY data of course. Let us say we want to group by month and plot mean monthly births as before, but now using tsibble and the index variable:

    • tsibble vs timetk: Basic Plot
    • tsibble vs timetk: Grouped Plot 1
    • tsibble vs timetk: Grouped Plot 2
    births_tsibble %>%
      gf_line(births ~ date,
        data = .,
        title = "Basic tsibble plotted with ggformula"
      )
    # timetk **can** plot tsibbles.
    births_tsibble %>%
      timetk::plot_time_series(
        .date_var = date,
        .value = births,
        .title = "Tsibble Plotted with timetk"
      )

    births_tsibble %>%
      tsibble::index_by(month_index = ~ tsibble::yearmonth(.)) %>%
      dplyr::summarise(mean_births = mean(births, na.rm = TRUE)) %>%
      gf_point(mean_births ~ month_index,
        data = .,
        title = "Monthly Aggregate with tsibble"
      ) %>%
      gf_line() %>%
      gf_smooth(se = FALSE, method = "loess")
    births_timeseries %>%
      # timetk cannot wrangle tsibbles
      # timetk needs tibble or data frame
      timetk::summarise_by_time(
        .date_var = date,
        .by = "month",
        mean = mean(births)
      ) %>%
      timetk::plot_time_series(date, mean,
        .title = "Monthly aggregate births with timetk",
        .x_lab = "year",
        .y_lab = "Mean Monthly Births"
      )

    Apart from the bump during 2006-2007, there are also seasonal patterns that repeat each year, which we glimpsed earlier.

    births_tsibble %>%
      tsibble::index_by(year_index = ~ lubridate::year(.)) %>%
      dplyr::summarise(mean_births = mean(births, na.rm = TRUE)) %>%
      gf_point(mean_births ~ year_index, data = .) %>%
      gf_line() %>%
      gf_smooth(se = FALSE, method = "loess")
    births_timeseries %>%
      timetk::summarise_by_time(
        .date_var = date,
        .by = "year",
        mean = mean(births)
      ) %>%
      timetk::plot_time_series(date, mean,
        .title = "Yearly aggregate births with timetk",
        .x_lab = "year",
        .y_lab = "Mean Yearly Births"
      )
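
    The index_by() + summarise() pattern used in the tsibble chunks above can be checked on a toy series where the answer is known in advance. This is only a sketch; the `toy` data and its values are invented for illustration:

    ```r
    library(tibble)
    library(dplyr)
    library(tsibble)

    # Toy daily series for Jan-Mar 2000; every day in a month has the same
    # invented value (100, 200, 300), so the monthly means are known beforehand
    toy <- tibble(
      date = seq(as.Date("2000-01-01"), as.Date("2000-03-31"), by = "day"),
      births = 100 * as.integer(format(date, "%m"))
    ) %>%
      as_tsibble(index = date)

    # Group the index by calendar month, then average within each group
    toy_monthly <- toy %>%
      index_by(month_index = ~ yearmonth(.)) %>%
      summarise(mean_births = mean(births))

    toy_monthly
    ```

    The 91 daily rows collapse to one row per month, with month_index taking over as the time index of the result.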

    Candle-Stick Plots

    Hmm…can we try to plot boxplots over time (Candle-Stick Plots)? Over month / quarter or year?

    Monthly Box Plots

    births_tsibble %>%
      index_by(month_index = ~ yearmonth(.)) %>%
      # 15 years
      # No need to summarise, since we want boxplots per year / month
      gf_boxplot(births ~ date,
        group = ~month_index,
        fill = ~month_index, data = .
      )
    # plot the groups
    # 180 plots!!
    
    births_timeseries %>%
      # timetk::summarise_by_time(.date_var = date,
      #                           .by = "month",
      #                           mean = mean(births)) %>%
      timetk::plot_time_series_boxplot(date, births,
        .title = "Monthly births with timetk",
        .x_lab = "year", .period = "month",
        .y_lab = "Births"
      )

    Quarterly boxplots

    births_tsibble %>%
      index_by(qrtr_index = ~ yearquarter(.)) %>% # 60 quarters over 15 years
      # No need to summarise, since we want boxplots per year / month
      gf_boxplot(births ~ date,
        group = ~qrtr_index,
        fill = ~qrtr_index,
        data = .
      ) # 60 plots!!

    births_timeseries %>%
      timetk::plot_time_series_boxplot(date, births,
        .title = "Quarterly births with timetk",
        .x_lab = "year", .period = "quarter",
        .y_lab = "Births"
      )

    Yearwise boxplots

    births_tsibble %>%
      index_by(year_index = ~ lubridate::year(.)) %>% # 15 years, 15 groups
      # No need to summarise, since we want boxplots per year / month
    
      gf_boxplot(births ~ date,
        group = ~year_index,
        fill = ~year_index,
        data = .
      ) %>% # plot the groups 15 plots
      gf_labs(title = "Yearly aggregate births with ggformula") %>%
      gf_theme(scale_fill_distiller(palette = "Spectral"))

    births_timeseries %>%
      timetk::plot_time_series_boxplot(date, births,
        .title = "Yearly aggregate births with timetk",
        .x_lab = "year", .period = "year",
        .y_lab = "Births"
      )

    Although the graphs are very busy, they do reveal seasonality trends at different periods.
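
    Each box summarises the daily values within one period by their quartiles. A rough sketch of the same numbers computed directly with dplyr, on invented data (note that quantile() with its default settings may differ slightly from the hinges that the plotting functions draw):

    ```r
    library(tibble)
    library(dplyr)

    # Invented data: two "years" with five daily values each
    toy <- tibble(
      year = rep(c(2000, 2001), each = 5),
      births = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
    )

    # Quartiles per year: roughly the bottom, middle line and top of each box
    toy_quartiles <- toy %>%
      group_by(year) %>%
      summarise(
        q1 = quantile(births, 0.25, names = FALSE),
        median = median(births),
        q3 = quantile(births, 0.75, names = FALSE)
      )

    toy_quartiles
    ```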

    How about a heatmap? We can cook up a categorical variable based on the number of births (low, fine, high) and use that to create a heatmap:

    births_2000_2014 %>%
      mutate(birthrate = case_when(
        births >= 10000 ~ "high",
        births <= 8000 ~ "low",
        TRUE ~ "fine"
      )) %>%
      gf_tile(
        data = .,
        year ~ month,
        fill = ~birthrate,
        color = "black"
      ) %>%
      gf_theme(scale_x_continuous(
        breaks = 1:12,
        labels = c(
          "Jan", "Feb", "Mar", "Apr",
          "May", "Jun", "Jul", "Aug",
          "Sep", "Oct", "Nov", "Dec"
        )
      )) %>%
      gf_theme(theme_classic())
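
    To see how the binning behaves at the cut-offs, here is a small sketch of the same case_when() rule applied to a handful of invented values (the `toy` tibble is an assumption for illustration):

    ```r
    library(tibble)
    library(dplyr)

    # Invented daily counts chosen to straddle both cut-offs
    toy <- tibble(births = c(7500, 8000, 9000, 10000, 10500))

    # case_when() checks the conditions top to bottom and uses the first match;
    # TRUE acts as the catch-all for everything in between
    toy_binned <- toy %>%
      mutate(birthrate = case_when(
        births >= 10000 ~ "high",
        births <= 8000 ~ "low",
        TRUE ~ "fine"
      ))

    toy_binned
    ```

    Both cut-offs are inclusive: exactly 8000 counts as "low" and exactly 10000 as "high".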

    Conclusion

    We have seen a good few data formats for time series, and how to work with them and plot them. We have also seen how to decompose time series into periodic and aperiodic components, which can be used to make business decisions.

    Your Turn

    1. Choose some of the datasets in the tsdl and in the tsibbledata packages. Plot basic, filtered and model-based graphs for these and interpret.

    References

    1. Rob J. Hyndman and George Athanasopoulos, Forecasting: Principles and Practice (Third Edition). Available online.

    2. Time Series Analysis at Our Coding Club

    Readings

    1. The Nuclear Threat - The Shadow Peace, part 1

    2. 11 Ways to Visualize Changes Over Time – A Guide

    3. What is seasonal adjustment and why is it used?

    4. The start-at-zero rule


    Extra Stuff

    Using tsbox and TSstudio

    Let us create a time variable in our dataset now:

    • tsbox::ts_plot needs just the date and the births columns, so that it is not confused by the other numerical columns. Let us create a single date column from year, month and date_of_month, but retain those three for now.
    • TSstudio::ts_plot also needs a date column.

    The dataset has separate numerical variables for year, month, date_of_month and day_of_week, and of course the births on a daily basis.

    We use the lubridate package from the tidyverse:

    births_timeseries <-
      births_2000_2014 %>%
      mutate(date = lubridate::make_date(
        year = year,
        month = month,
        day = date_of_month
      )) %>%
      select(date, births, year, month, date_of_month, day_of_week)
    
    births_timeseries
    Tip: Extract from help(tsbox)

    In data frames, i.e., in a data.frame, a data.table, or a tibble, tsbox stores one or multiple time series in the ‘long’ format. tsbox detects a value, a time column, and zero, one or several id columns. Column detection is done in the following order:

    • Starting on the right, the first numeric or integer column is used as value column.
    • Using the remaining columns and starting on the right again, the first Date, POSIXct, numeric or character column is used as time column. character strings are parsed by anytime::anytime(). The timestamp, time, indicates the beginning of a period.
    • All remaining columns are id columns. Each unique combination of id columns points to a (unique) time series.

    Alternatively, the time column and the value column can be explicitly named as time and value. If explicit names are used, the column order will be ignored. If columns are detected automatically, a message is returned.
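
    The explicit-naming route can be sketched as follows, on an invented monthly series (the `toy` data is an assumption for illustration; with the births data, something like select(time = date, value = births) would play the same role):

    ```r
    library(tibble)
    library(tsbox)

    # Invented monthly series; the columns are named `time` and `value`
    # explicitly, so tsbox does not have to guess from the column order
    toy <- tibble(
      value = as.numeric(1:12),
      time = seq(as.Date("2000-01-01"), by = "month", length.out = 12)
    )

    # Convert to a base R `ts` object
    toy_ts <- ts_ts(toy)

    toy_ts
    ```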

    Plotting births_timeseries directly, after selecting only the relevant variables so that they are auto-detected:

    births_timeseries %>%
      select(date, births) %>%
      tsbox::ts_plot()
    [time]: 'date' [value]: 'births' 

    births_timeseries %>%
      select(date, births) %>%
      TSstudio::ts_plot(
        Xtitle = "Year",
        Ytitle = "Births",
        title = "Births Time Series",
        Xgrid = TRUE, Ygrid = TRUE,
        slider = TRUE,
        width = 1
      ) # linewidth

    Quite messy, as before. We need to use the summarised data instead. We will do this in the next section.

    References

    R Package Citations

    Package        Version  Citation
    ggridges       0.5.6    Wilke (2024)
    NHANES         2.1.0    Pruim (2015)
    TeachHist      0.2.1    Lange (2023)
    TeachingDemos  2.13     Snow (2024)

    Lange, Carsten. 2023. TeachHist: A Collection of Amended Histograms Designed for Teaching Statistics. https://doi.org/10.32614/CRAN.package.TeachHist.
    Pruim, Randall. 2015. NHANES: Data from the US National Health and Nutrition Examination Study. https://doi.org/10.32614/CRAN.package.NHANES.
    Snow, Greg. 2024. TeachingDemos: Demonstrations for Teaching and Learning. https://doi.org/10.32614/CRAN.package.TeachingDemos.
    Wilke, Claus O. 2024. ggridges: Ridgeline Plots in “ggplot2”. https://doi.org/10.32614/CRAN.package.ggridges.

    Footnotes

    1. https://www.gapminder.org/data/

    Citation

    BibTeX citation:
    @online{v2022,
      author = {V, Arvind},
      title = {🕔 {Time} {Series}},
      date = {2022-12-15},
      url = {https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/50-Time/files/time-interactive.html},
      langid = {en},
      abstract = {Events, Trends, Seasons, and Changes over Time}
    }
    
    For attribution, please cite this work as:
    V, Arvind. 2022. “🕔 Time Series.” December 15, 2022. https://av-quarto.netlify.app/content/courses/Analytics/Descriptive/Modules/50-Time/files/time-interactive.html.

    License: CC BY-SA 2.0

    Website made with ❤️ and Quarto, by Arvind V.
